
                        CHAPTER  5.


                        PHONEMICS AND VOICES.




          DECtalk PC PHONEMIC INPUT

          This chapter describes the phonemic (sound) system of English
          used  in DECtalk PC and  the ways to control DECtalk PC's
          pronunciation.


          DECtalk PC represents the state of the art in text-to-speech
          synthesis. The software shipped with it is the latest DECtalk
          speech microcode and contains a number of significant
          improvements over its predecessors. It makes fewer
          pronunciation errors and handles text in a more sophisticated
          way. It should also sound more natural and intelligible.
          Naturalness in synthesized speech is a continuum that is
          evolving slowly, both because accurately replicating human
          speech is inherently complex and because naturalness itself is
          difficult to define. Even so, developers should find phonemic
          transcription less necessary given this added sophistication.


          PHONEMIC TRANSCRIPTION
          
          Most users do not need to know anything about DECtalk PC
          phonemic  input and may never need to use the phonemic alphabet.
          This is because improvements in text-to-speech technology over
          the past few years mean that the pronunciation of an ordinary
          word rarely needs correcting. On the other hand,
          many developers will want to enter unusual words in the
          definable dictionary or, for various reasons, modify the sound
          of the synthesized speech, perhaps to attain a higher degree of
          naturalness, to demonstrate emotion,  or to  emphasize a particular
          word or phrase. In these cases, it may be helpful to understand
          in a bit more detail how DECtalk PC works.

          To understand how the DECtalk PC system works and to  make
          DECtalk PC correctly pronounce any English word, you may wish to
          know something about speech sounds and how to represent them on
          a  keyboard. Because spelling in English does not always show
          exactly  how words are pronounced, dictionaries use symbols to
          show how  words really sound. Sometimes these symbols are the
          same as  letters used in spelling. A word written the way it is
          pronounced is said to be in  phonemic transcription or simply in
          phonemics.



          PRONUNCIATION ERRORS

          When DECtalk PC says a word or phrase incorrectly, you may need
          to  use phonemic input to get the desired pronunciation. The
          following  list suggests the most common types of errors that
          DECtalk PC makes,  and the best corrective action.

          Note: Before using phonemic transcription or clever
          misspellings, confirm that DECtalk does indeed mispronounce
          the word. In the vast majority of cases, the word will be
          pronounced correctly. You will need to turn phonemic mode on
          with a special command (above).

                  DECtalk PC mispronounces a proper name:
                          Lee Iacocca.

                          Corrective action: Convert to phonemic form.
                          Lee [ayaxk'owkax].

                          Or misspell in a clever way:
                          Lee Eye a Coke a.

                  DECtalk PC mispronounces an acronym:
                          The UN building.

                          Corrective action: Respell with spaces between
                          the letters:
                          U N

                          Or use phonemics.
                          ['yuw 'ehn]

                  DECtalk PC mispronounces an unfamiliar word:
                          articulatory.

                          Corrective action: Convert to phonemic form.
                          [aart'ihkyaxlaxtowriy].




                  DECtalk PC guesses incorrectly for an ambiguously
                  pronounced word:


                                  The insert
                                  Get the lead out.

                           Corrective action:  Convert to phonemic form

                           The ['ihnsrrt]
                           Get the [l'ehd] out.
                  
                  DECtalk PC uses the wrong syntactic classification of a
                  preposition or  particle.

                          He takes on tough jobs.   ("He does tough jobs"
                          versus "He accepts graft when on tough jobs.")
                          
                          Corrective action: Add a stress phoneme when
                          needed.


                           He takes [']on tough jobs.

                           Or convert to phonemic form:

                           He takes ['aan] tough jobs.

                  DECtalk PC uses the wrong phrasing:

                          Following a long gasp shouts were heard.

                          Corrective action: Add commas or a verb-phrase
                          introducer phoneme where needed.

                          Following a long gasp, shouts were heard.
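
          The corrective actions above can also be applied by an
          application before text is sent to DECtalk PC. The following
          Python sketch is one way to do that. The table and the function
          name apply_corrections are hypothetical; the entries are taken
          from the examples in this section, and phonemic mode is assumed
          to be on already.

```python
import re

# Hypothetical correction table built from the examples in this section.
# Keys are matched as whole words and are case-sensitive.
CORRECTIONS = {
    "Iacocca": "[ayaxk'owkax]",
    "UN": "U N",
    "articulatory": "[aart'ihkyaxlaxtowriy]",
}


def apply_corrections(text, corrections=CORRECTIONS):
    """Replace each whole-word match with its respelling or phonemic form."""
    if not corrections:
        return text
    # Try longer keys first so a short key never shadows a longer one.
    words = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda m: corrections[m.group(1)], text)


print(apply_corrections("Lee Iacocca visited the UN building."))
# prints: Lee [ayaxk'owkax] visited the U N building.
```

          Matching whole words only keeps a short entry such as "UN" from
          altering longer words that happen to contain the same letters.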


                  INTRODUCTION TO PHONEMIC THEORY.

          At one time long ago, English was pronounced as it was spelled,
          with each letter (or pair of letters) representing one sound.
          Because of historical sound changes, such as the dropping of
          sounds like the "gh" of "bought" or the "k" of "knight," and
          word borrowing from other languages, English pronunciation
          rules have become complex and include many exceptions. For
          example, "of" is pronounced with a "v" sound, while all other
          English words spelled with "f" are pronounced with an "f"
          sound. The vowel sequence "ea" can be pronounced in at least a
          half-dozen ways, as illustrated by the sounds in the words
          "cheap," "head," "earth," and "idea." The letters "th" can be
          pronounced with a voiceless phoneme as in "thin," or with a
          voiced phoneme as in "this"; or the "th" can represent the "t"
          phoneme followed by the "h" phoneme in compound words such as
          "pothole."

          Some words have two pronunciations, for example, "read." Correct
          pronunciation of a sentence such as "Will you read the book or
          have you read it already?" requires an understanding of the
          meaning of the sentence, a task that DECtalk is learning to
          do. DECtalk can often correctly predict which of the alternate
          pronunciations is correct in a given context. However, because
          of the nature of language, it occasionally makes a mistake. If
          this occurs, you can get the alternate pronunciation in two
          ways.


                  By misspelling the word, e.g., "red" for "read"

                   By phonemic spelling: [r'ehd].


                  Ex. Will you read the book or have you [r'ehd] it
                      already?
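
          When only one occurrence of a word needs the alternate
          pronunciation, an application has to target that occurrence
          alone. The following Python sketch shows one way to do so;
          the helper name replace_nth is hypothetical.

```python
import re


def replace_nth(text, word, n, replacement):
    """Replace the n-th (1-based) whole-word occurrence of word in text."""
    matches = list(re.finditer(r"\b" + re.escape(word) + r"\b", text))
    if not 1 <= n <= len(matches):
        return text  # no such occurrence, so leave the text unchanged
    target = matches[n - 1]
    return text[:target.start()] + replacement + text[target.end():]


sentence = "Will you read the book or have you read it already?"
print(replace_nth(sentence, "read", 2, "[r'ehd]"))
# prints: Will you read the book or have you [r'ehd] it already?
```

          The whole-word match means the letters "read" inside "already"
          are not counted as an occurrence.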


          Stress is an important part of phonemic representation. Stress
          alone distinguishes the two different pronunciations of words
          like  "insert."

          English words usually have one syllable that is spoken with more
          stress than the other syllables in the word. You can indicate
          this  primary stress to DECtalk PC by placing the phonemic
          symbol [']  before the vowel. The ['] symbol is described below.
           For example, the word "insert" can be spoken as a noun:


                  "insert" = ['ihnsrrt].

           and as a verb:
                   
                   "insert" =  [ixns'rrt].



          Considering the complexity of English pronunciation rules and
          the  number of exceptions, it is not surprising that DECtalk
          occasionally  makes such pronunciation errors. You can adjust
          DECtalk pronunciation  through a large number of symbols,
          described in the rest of this  chapter.  Again, DECtalk V4.0
          has improved pronunciation rules and, as a result, such phonemic
          intervention will only occasionally be needed.


          PHONEMES.

           A phoneme is the smallest unit of speech that distinguishes one
          word from another. Of all the sounds that human beings can
          produce, relatively few are significant in any one language.
          Only  about 40 different functional sound types or phonemes are
          used in General  American English.

           The phonemes of English are not pronounced the same by every
          speaker. We all know people who pronounce some words differently
          from the way we do, yet we understand them. The differences may
          occur because we come from different parts of the country.
          Because  of these variations, there is no such thing as a universal
          standard pronunciation of American English. DECtalk PC attempts
          to mimic a  Midwestern (Northern Milwaukee) dialect.
           Because DECtalk PC pronounces a phoneme in a standard
          rule-governed  way, it is not possible to imitate all other
          English dialects  (although you can approximate some dialectal
          differences by  phonemic spelling).

           The following sections describe the vowel and consonant
          phonemes,  stress and syntactic symbols, and optional direct
          control of  intonation or singing.


                   VOWEL AND CONSONANT PHONEMES.

           Linguists have identified about 17 vowel phonemes and 24
          consonant phonemes for American English. Vowel phonemes are
          typically described by tongue position (high versus low in the
          mouth, and front versus back of the mouth), which correlates
          with the frequencies of the two lowest natural resonances of
          the vocal tract. The lowest resonance frequency is the first
          formant, F1, and the next lowest is the second formant, F2.
          Consonant phonemes are typically described by their places of
          major articulatory constrictions and the manners of forming the
          constrictions.

          Appendix B lists the  consonant and vowel phonemes of English as
          used by DECtalk. The symbols used for each phoneme are
          identified by a key word with the relevant phonemic sound in
          italics.

          In many cases, phonemes are indicated by two letters instead of
          the special characters or diacritic symbols that often appear
          in dictionaries. DECtalk PC accepts phonemic symbols in either
          uppercase or lowercase, although lowercase is more commonly
          used. The letter pairs have been designed so that it is not
          necessary to put a space between the phonemes of a word. In
          fact, a space indicates a word boundary. DECtalk PC can parse
          input phonemic letter sequences to determine the unique phoneme
          sequence in all cases.
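
          To illustrate how an unspaced letter sequence can be split into
          phonemic symbols, the following Python sketch tries a two-letter
          symbol first and falls back to a single letter. This
          longest-match strategy is an assumption made for illustration,
          not a description of DECtalk PC's internal parser. The symbol
          set is a partial one drawn only from symbols that appear in this
          chapter; Appendix B holds the full list.

```python
# Partial, assumed inventory: only symbols that appear in this chapter.
SYMBOLS = {
    "aa", "ae", "ax", "ay", "eh", "ih", "ix", "iy", "ow", "uw", "rr",
    "el", "en", "ch", "dh", "th", "jh", "hx", "nx", "lx",
    "b", "d", "g", "k", "l", "m", "n", "q", "r", "s", "t", "w", "y",
    "'", "-", "~", "_",
    " ",  # a space marks a word boundary
}


def split_phonemes(phonemic):
    """Split a phonemic string into symbols, preferring two-letter symbols."""
    body = phonemic.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    body = body.lower()  # symbols are accepted in either case
    symbols = []
    i = 0
    while i < len(body):
        for width in (2, 1):
            piece = body[i:i + width]
            if piece in SYMBOLS:
                symbols.append(piece)
                i += len(piece)
                break
        else:
            raise ValueError("unknown symbol at position %d: %r" % (i, body[i]))
    return symbols


print(split_phonemes("[m'owtsaart]"))
# prints: ['m', "'", 'ow', 't', 's', 'aa', 'r', 't']
```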

          Phonemes are enclosed in square brackets instead of between
          the more traditional / symbols. The [ and ] characters mark the
          beginning and end of phonemic mode clearly with distinctively
          different symbols. The input format is not strictly phonemic
          because it also permits you to enter certain allophones
          (variants of a phoneme), making the representation closer to a
          broad phonetic transcription. When the command
          [:phoneme arpabet speak on] is given, all text within square
          brackets is treated as phonemic text.


                   Phonemic Correction the Easy Way.

          Developers may wish to learn the phonemic code. However, you can
          also consult one of the commonly available dictionaries to
          determine the phonemic pronunciation for the occasional word
          that DECtalk gets  wrong.
           For example, according to the Merriam-Webster Dictionary, the
          pronunciation of the word "Mozart" is:

                    \'mot-,sart\

          Using the Table of Appendix B, you can convert this
          transcription to the DECtalk phonemic string:
                  
                  [m'owtsaart].


          The User Dictionary.

          Every time DECtalk PC mispronounces a word in running text, your
          application could replace the text string ("Mozart")  with a
          phonemic string ([m'owtsaart]). However, if the number of words
          requiring phonemic translation in an application is small, it
          might be simpler to download a dictionary to DECtalk PC and let
          DECtalk PC perform the  replacement automatically.  DECtalk PC
          has memory allocated for a loadable dictionary. This dictionary
          is useful in cases where (a) DECtalk makes an error in
          pronunciation, or (b) the pronunciation of a string is unique
          to the application.  For example, if the sequence "n/cl" should
          be pronounced as "not cleared," then a user dictionary entry is
          needed.
                  
                  To download a dictionary to DECtalk PC, you must do the
          following:

          1. Create a dictionary file using an editor, with the file
          extension .tab, e.g. personal.tab. The dictionary must be in the
          following format:

                  (a) An entry must start at the first character of the
          line. A line whose first character cannot begin an entry is
          treated as a comment and is not processed.

                  (b) The syntax is a grapheme string followed by a
          phoneme string. A line may be up to 256 characters long.

                  (c) A grapheme  (letter) string is comprised of legal
          graphemes. Legal graphemes are:
          
          A-Z, a-z,  0-9 and select punctuation marks
          ("!, @, &, (, ), -, \,  and /). These characters may not be used
          at the beginning of the grapheme string. The grapheme string
          may be in either case.  Uppercase letters match only upper
          case; lowercase letters match either uppercase or lowercase.
                  
                  (d) The phoneme string is composed of legal phonemes.
          Phonemes are always in square brackets but may be in either
          upper or lower case.

          For example,  to make the word coffee be pronounced tea, you
          would enter the following:
          coffee  [t'iy]
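
          The format rules above can be checked before a table is
          compiled. The following Python sketch is a rough validator, not
          the check that user_dic itself performs. It assumes that the
          grapheme and phoneme strings are separated by whitespace and
          that the legal punctuation is exactly the set listed in (c);
          the function name parse_entry is hypothetical.

```python
import re

MAX_LINE = 256  # longest line allowed by rule (b)

# Rule (c): letters, digits, and the listed punctuation marks, with no
# punctuation at the start. Rule (d): phonemes inside square brackets.
# Whitespace as the separator between the two strings is an assumption.
ENTRY = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9!@&()\-\\/]*)\s+(\[[^\[\]]+\])\s*$"
)


def parse_entry(line):
    """Return (grapheme, phoneme) for a well-formed entry, else None."""
    line = line.rstrip("\n")
    if len(line) > MAX_LINE:
        return None
    match = ENTRY.match(line)
    if match is None:
        return None  # comment line or malformed entry
    return match.group(1), match.group(2)


print(parse_entry("coffee  [t'iy]"))
# prints: ('coffee', "[t'iy]")
```

          A line that starts with a space returns None here, matching
          rule (a), which requires an entry to begin at the first
          character of the line.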

          After creating your dictionary file, you can compile and load
          the dictionary by doing the following:
          
          2. Compile the dictionary by typing:

                  user_dic <input dictionary table>

          Example:
                  user_dic personal.tab
          
          Input files have the default extension of .tab but can be
          anything. Output dictionary files have the extension of .dtu and
          must  have that extension for the loader to find the file
          correctly. If no output file is specified, a file with the same
          name and .dtu extension will be created for the output.

          For example, if your dictionary table is called personal.tab,
          type:
                  user_dic personal

          3. Load the user dictionary by typing:
                  dt_load         <output file>

          For example, you would type:
                  dt_load  personal.dtu

          Your customized dictionary is now loaded.
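
          Because the dictionary must be compiled and then loaded in that
          order, an application may want to generate both command lines
          from the table name. The following Python sketch only builds
          the two commands described above and does not run them; the
          function name dictionary_commands is hypothetical.

```python
from pathlib import Path


def dictionary_commands(table_file):
    """Return the compile and load command lines for a dictionary table."""
    table = Path(table_file)
    compiled = table.with_suffix(".dtu")  # the loader expects .dtu
    return [
        ["user_dic", str(table)],
        ["dt_load", str(compiled)],
    ]


for command in dictionary_commands("personal.tab"):
    print(" ".join(command))
# prints:
# user_dic personal.tab
# dt_load personal.dtu
```

          On a system where user_dic and dt_load are installed, each list
          could be passed to a process launcher such as Python's
          subprocess.run.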
          
          Note: User dictionary lookups are done only on a single form of
          the word. No affix stripping occurs in user dictionary lookups.
          Therefore, inflected and derived forms must be entered
          separately.

          Warning: If your PC is powered down, you must reload the
          dictionary at power-up.
          
          Do not automatically assume that DECtalk will mispronounce a
          word, even a difficult one. DECtalk often guesses the correct
          pronunciation of even difficult or very complicated words.
          Also, with the [:pronounce name] command, DECtalk does a
          creditable job with proper names as well.


          VOWELS.


          While DECtalk recognizes 17 vowel phonemes, these vowels can
          sometimes change slightly when surrounded by certain phonemes.
          These variants are discussed below.

          Vowel Allophones


                  Allophones for  Vowels + [r].
            
            The vowels in words such as "beer," "bear," "bar," "bore,"
          and "poor" are different from the available vowel phonemes in
          DECtalk. They require special vowel-r allophones, which are
          listed below.

                  The Schwa Allophones [ax] and [ix].
            
            Another problem is with the unstressed reduced vowel called
          "schwa" in English. The vowel appears in words such as "about"
          and "kisses." In "kisses," the vowel is produced with a higher
          tongue position, symbolized by the vowel allophone [ix]. You
          can choose between [ax] and [ix] by noting the characteristics
          of the adjacent phonemes, but listening to the words will
          result in the best choice.

                  Syllabic Consonants
            
            The final syllable in words such  as "butter," "bottle," and
          "button" is usually symbolized in a  dictionary as consisting of
          a short vowel followed by a consonant.  For better sounding
          synthesis, DECtalk uses a set of syllabic  consonants, [rr],
          [el], and [en]  that are realized  without the short schwa.
          Syllabic "r" shares the same symbol as  the phoneme [rr] in a
          word such as "bird," but this leads to no  confusion inside
          DECtalk PC.

           The [em] allophone used in the earliest version of DECtalk no
          longer  exists and must be replaced by the two-phoneme sequence
          [axm] as  in the word "bottom" = [b'aataxm].

           In most situations, you do not need to be concerned about
          allophones because the vowel phonemes will be automatically
          converted into the appropriate allophones by DECtalk PC rules.
          For the developer, allophone selection can be induced or
          blocked by using the syllable boundary phoneme [-] and the
          rule-blocking phoneme [~], or by inserting allophone symbols
          in the phonemic spelling.


           CONSONANTS.

           The symbols that represent consonants are straightforward. In
          one case, [hx] , the two-letter sequence ensures  unambiguous
          parsing because the letter "h" is part of  some vowel symbols.
           DECtalk PC speaks an English dialect that does not distinguish
          voiced and voiceless w. Therefore, words like "which" and
          "witch"  are pronounced alike as [w'ihch].

           The letter "g" can be pronounced in two ways. In words like
          "gift," the consonant phoneme [g] is used. In words like "gin,"
          the phoneme [jh] is used.
           The letter sequence "th" can be pronounced with a voiceless
          sound  [th] as in "thin" or with a voiced sound [dh] as in
          "this."


          Consonant Allophones

            The [l] that follows a vowel is pronounced differently from
          the [l] in other contexts. For some speakers, the tongue tip
          may not even reach the roof of the mouth. This postvocalic
          allophone [lx] is automatically selected by DECtalk PC.


                  Glottal Stop [q].
            
            The glottal stop [q] is used in some  situations to indicate a
          word boundary, especially when the next  word begins with a
          vowel. Overuse of this symbol can lead to a  stilted style of
          speaking.


          Controlling Allophone Selection.

            DECtalk automatically  inserts certain other allophones for
          [k], [q], and [nx] when  appropriate. It also selects the
          prevoiced and voiceless  unaspirated allophones of [b], [d], and
          [g]. You cannot access  these allophones.
           If DECtalk does not select one of these allophones, you  can
          insert the allophone symbol directly in a phonemic
          representation of the word in question.

           If DECtalk PC uses one of these allophones inappropriately,
          place the rule-blocking phoneme [~] before the phoneme in
          question to block application of all allophonic substitution
          rules. For example, to say "batter" without a flap being
          substituted for the [t], enter the phonemic string [b'ae~trr].

          Silence Phoneme [_]
           
           DECtalk PC automatically inserts a silence (brief pause)
          whenever  punctuation appears in the text. The phonemic silence
          symbol [_]  is useful for controlling silence while in phonemic
          mode. Silences  and other pauses are described in more detail
          below.

          End of Chapter 5, Part A.
